Bibliography
[164] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In
Proceedings of the International Conference on Learning Representations, pages 1–18,
2017.
[165] Ziyang Luo, Artur Kulmizev, and Xiaoxi Mao. Positional artefacts propagate through
masked language model embeddings. arXiv preprint arXiv:2011.04393, 2020.
[166] X. Ma, P. Zhang, S. Zhang, N. Duan, Y. Hou, D. Song, and M. Zhou. A tensorized
transformer for language modeling. In Advances in Neural Information Processing
Systems, 2019.
[167] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning
models resistant to adversarial attacks. In ICLR, 2017.
[168] Brais Martinez, Jing Yang, Adrian Bulat, and Georgios Tzimiropoulos. Training
binary neural networks with real-to-binary convolutions. arXiv preprint
arXiv:2003.11535, 2020.
[169] Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei
Sun, and Jingdong Wang. Conditional DETR for fast training convergence. In
Proceedings of the IEEE/CVF International Conference on Computer Vision, pages
3651–3660, 2021.
[170] Xiangming Meng, Roman Bachmann, and Mohammad Emtiyaz Khan. Training
binary neural networks using the Bayesian learning rule. In International Conference on
Machine Learning, pages 6852–6861. PMLR, 2020.
[171] D Messerschmitt. Quantizing for maximum output entropy (corresp.). IEEE
Transactions on Information Theory, 17(5):612–612, 1971.
[172] Paul Michel, Omer Levy, and Graham Neubig. Are sixteen heads really better than
one? Advances in Neural Information Processing Systems, 32, 2019.
[173] Luca Mocerino and Andrea Calimera. TentacleNet: A pseudo-ensemble template for
accurate binary convolutional neural networks. In 2020 2nd IEEE International
Conference on Artificial Intelligence Circuits and Systems (AICAS), pages 261–265. IEEE,
2020.
[174] Jonas Mockus, Vytautas Tiesis, and Antanas Zilinskas. The application of Bayesian
methods for seeking the extremum. Towards Global Optimization, 2:117–129, 1978.
[175] Todd K Moon. The expectation-maximization algorithm. IEEE Signal Processing
Magazine, 13(6):47–60, 1996.
[176] Jean-Jacques Moreau. Proximité et dualité dans un espace hilbertien. Bulletin de la
Société mathématique de France, 93:273–299, 1965.
[177] Matthias Mueller, Neil Smith, and Bernard Ghanem. A benchmark and simulator for
UAV tracking. In Computer Vision–ECCV 2016: 14th European Conference,
Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 445–461.
Springer, 2016.
[178] Prasanna Kumar Muthukumar and Alan W Black. A deep learning approach to data-
driven parameterizations for statistical parametric speech synthesis. arXiv preprint
arXiv:1409.8558, 2014.